Add hand written gelu derivative #480

Merged Mar 4, 2023 (6 commits)


chengchingwen
Member

No description provided.

@chengchingwen chengchingwen requested a review from mcabbott March 2, 2023 10:43
@chengchingwen chengchingwen changed the title from "Add hand written glue derivative" to "Add hand written gelu derivative" Mar 2, 2023
@CarloLucibello
Member

Do you have some timings for it?

@chengchingwen
Member Author

> Do you have some timings for it?

before:

julia> CUDA.@time CUDA.@sync Flux.gradient((m, x)->sum(sin.(m(x))), gpu(Dense(512, 512, gelu)), gpu(randn(512, 128, 16)));
  0.058803 seconds (37.33 k CPU allocations: 16.095 MiB) (17 GPU allocations: 54.004 MiB, 0.69% memmgmt time)

julia> @btime CUDA.@sync Flux.gradient((m, x)->sum(sin.(m(x))), $(gpu(Dense(512, 512, gelu))), $(gpu(randn(512, 128, 16))));
  387.026 μs (523 allocations: 27.28 KiB)

after:

julia> CUDA.@time CUDA.@sync Flux.gradient((m, x)->sum(sin.(m(x))), gpu(Dense(512, 512, gelu)), gpu(randn(512, 128, 16)));
  0.062451 seconds (37.29 k CPU allocations: 16.076 MiB) (16 GPU allocations: 46.004 MiB, 0.61% memmgmt time)

julia> @btime CUDA.@sync Flux.gradient((m, x)->sum(sin.(m(x))), $(gpu(Dense(512, 512, gelu))), $(gpu(randn(512, 128, 16))));
  356.649 μs (482 allocations: 25.56 KiB)
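
A quick reading of the numbers above, as a rough sketch. The timings and allocation sizes are copied from the output; the assumption that the activations are `Float32` (4 bytes per element) is mine and is not stated in the thread. The single-shot `CUDA.@time` wall time is not compared because it is one noisy sample.

```julia
# Figures copied from the benchmark output above.
before_us, after_us   = 387.026, 356.649   # @btime, microseconds
before_mib, after_mib = 54.004, 46.004     # GPU allocations, MiB

# Relative reduction in the @btime measurement, in percent.
speedup_pct = 100 * (before_us - after_us) / before_us

# GPU memory no longer allocated per gradient call.
saved_mib = before_mib - after_mib

# Size of one 512×128×16 activation array.
# ASSUMPTION: elements are Float32; the thread does not state the eltype.
one_array_mib = 512 * 128 * 16 * sizeof(Float32) / 2^20

println("time reduction: ", round(speedup_pct; digits = 1), " %")
println("memory saved:   ", round(saved_mib; digits = 3), " MiB")
println("arrays saved:   ", round(saved_mib / one_array_mib; digits = 2))
```

Under that assumption, the memory saved corresponds to a whole number of activation-sized arrays, and the `@btime` figure drops by a little under 8%.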

@chengchingwen
Member Author

@mcabbott Do you want to check the numerical accuracy, or is it good to go?

@mcabbott
Member

mcabbott commented Mar 4, 2023

Looks good. I don't think any of the gradients have been carefully checked for floating-point accuracy.
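
One way such an accuracy check could be done is sketched below. It evaluates a derivative formula in `Float32` and compares it with the same formula evaluated in `BigFloat`. The function `dgelu_generic` is a hypothetical stand-in that assumes the tanh approximation of GELU; it is not the code from this PR, and `max_abs_error` is an illustrative helper. Absolute error is used because the derivative crosses zero, where relative error is not meaningful.

```julia
# Hypothetical derivative of the tanh-approximated GELU, written generically
# so the same formula runs at any floating-point precision.
function dgelu_generic(x::T) where {T<:AbstractFloat}
    a = sqrt(2 / T(pi))             # sqrt(2/π) at precision T
    b = T(44715) / T(1000000)       # 0.044715 at precision T
    u = a * (x + b * x^3)
    t = tanh(u)
    return (1 + t) / 2 + x / 2 * (1 - t^2) * a * (1 + 3 * b * x^2)
end

# Largest absolute difference between the low-precision result and a
# BigFloat reference over the sample points `xs`.
function max_abs_error(xs)
    worst = 0.0
    for x in xs
        ref = dgelu_generic(big(x))             # big(::Float32) is exact
        err = abs(big(dgelu_generic(x)) - ref)
        worst = max(worst, Float64(err))
    end
    return worst
end

xs = range(-3f0, 3f0; length = 601)             # Float32 sample points
println("max abs error in units of eps(Float32): ",
        max_abs_error(xs) / eps(Float32))
```

The printed value gives a rough idea of how many `Float32` epsilons are lost on this range; a more careful study would also sample far into the tails.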

This saves 2 copies by avoiding Zygote's broadcasting with `Dual` numbers.
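
For readers unfamiliar with the idea, a hand-written derivative can be evaluated with an ordinary broadcast over plain numbers instead of differentiating through dual numbers. The sketch below assumes the tanh approximation of GELU with the usual 0.044715 coefficient. The names `gelu_sketch` and `gelu_deriv_sketch` are hypothetical, and this is not the code merged here. The constants are `Float64`, so `Float32` inputs would be promoted; a real implementation would match the input type.

```julia
# ASSUMPTION: tanh approximation of GELU; all names here are illustrative only.
const SQRT_2_OVER_PI = sqrt(2 / pi)
const GELU_COEF = 0.044715

# gelu(x) ≈ x/2 * (1 + tanh(sqrt(2/π) * (x + c*x^3)))
gelu_sketch(x) = x / 2 * (1 + tanh(SQRT_2_OVER_PI * (x + GELU_COEF * x^3)))

# Analytic derivative via the product and chain rules:
#   d/dx [x/2 * (1 + t)] = (1 + t)/2 + x/2 * (1 - t^2) * du/dx,  t = tanh(u)
function gelu_deriv_sketch(x)
    u  = SQRT_2_OVER_PI * (x + GELU_COEF * x^3)
    t  = tanh(u)
    du = SQRT_2_OVER_PI * (1 + 3 * GELU_COEF * x^2)
    return (1 + t) / 2 + x / 2 * (1 - t^2) * du
end

# Elementwise use on an array: one broadcast for the value, one for the slope.
x  = [-2.0, -0.5, 0.0, 0.5, 2.0]
y  = gelu_sketch.(x)
dy = gelu_deriv_sketch.(x)
```

In a reverse-mode rule, the pullback would multiply the incoming gradient elementwise by `dy`.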

@chengchingwen chengchingwen merged commit 5a1c42c into FluxML:master Mar 4, 2023